Ontology Based Framework for Web Page Information Extraction

Authors

  • Naveen Gupta
  • Amit Sinhal
Abstract

Web information is dynamic and irregular, which makes it difficult to search and integrate. The biggest task in making WWW data accessible to users and agents is extracting the data from Web pages. We take advantage of information in existing Web pages to create structured data semi-automatically. Extracting information from semi-structured or unstructured documents, such as Web pages, is a useful yet complex task. Research has demonstrated that an ontology can be used to achieve a high degree of accuracy in data extraction while remaining resilient to document changes. This paper proposes an ontology-based information extraction system and applies it to the online bookstore domain. Test results show that the algorithm does not rely on page structure and can increase the recall and precision of information extraction.
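As a rough illustration of the idea in the abstract, the sketch below extracts book fields by matching ontology concepts against page text rather than against the HTML layout, and then scores the result with precision and recall. It is not the authors' system: the `ONTOLOGY` dictionary, the `extract` and `precision_recall` functions, the label keywords, and the value patterns are all hypothetical and chosen only for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical book-domain ontology (an assumption, not from the paper):
# concept -> (label keywords, regular expression for the value).
ONTOLOGY = {
    "title": (["title"], r"[^|\n]+"),
    "author": (["author", "authors"], r"[^|\n]+"),
    "price": (["price"], r"[$€£]\s?\d+(?:\.\d{2})?"),
    "isbn": (["isbn"], r"[\d-]{10,17}"),
}


class _TextCollector(HTMLParser):
    """Collects the non-empty text chunks of a page and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def extract(html, ontology=ONTOLOGY):
    """Return {concept: value} for every ontology concept found in the text."""
    collector = _TextCollector()
    collector.feed(html)
    text = "\n".join(collector.chunks)
    record = {}
    for concept, (labels, value_pattern) in ontology.items():
        label_re = "|".join(re.escape(label) for label in labels)
        match = re.search(
            rf"\b(?:{label_re})\b\s*:?\s*({value_pattern})", text, re.IGNORECASE
        )
        if match:
            record[concept] = match.group(1).strip()
    return record


def precision_recall(extracted, gold):
    """Precision and recall of extracted fields against a gold record."""
    correct = sum(1 for key, value in extracted.items() if gold.get(key) == value)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Two made-up pages with the same content but different markup.
table_page = (
    "<table><tr><td>Title:</td><td>Dune</td></tr>"
    "<tr><td>Author:</td><td>Frank Herbert</td></tr>"
    "<tr><td>Price:</td><td>$9.99</td></tr></table>"
)
div_page = (
    "<div><p>Title: Dune</p><span>Author: Frank Herbert</span>"
    "<b>Price: $9.99</b></div>"
)
```

Because the matching works on labels and value patterns in the text, `extract(table_page)` and `extract(div_page)` return the same record even though the markup differs. A real system would need a much richer ontology and would have to handle pages where labels are missing or ambiguous.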

Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

Semantic Extraction from List Web Pages

Extracting structured information from web pages is a problem that has many applications and has gained increased interest in recent years. We propose an approach that can achieve extraction and semantic description of data contained in a list web page. Our approach is fully automatic and is based on a "seed" ontology that contains minimal information about the domain. It uses an instance-base...

An Ontology-Based Extraction Framework for a Semantic Web Application

The Semantic Web vision is rapidly becoming a mainstream reality, but obstacles remain in the way. A major challenge is the adoption of practical Semantic Web applications and the production of vast stores of ubiquitous meta-data which is needed to allow robust inference engines to attain the goals of machine readability of web documents. The authors propose the Semantic Web Applications (SEMWA...

WPPS: A Novel and Comprehensive Framework for Web Page Understanding and Information Extraction

In this paper, we present WPPS, a new, highly configurable Java-based framework for developing efficient and robust methods that address problems in the fields of web page understanding and information extraction. Furthermore, we introduce the representation of a web page as a unified ontological model (UOM), describing its different aspects such as layout, visual features, interface, DOM tree,...

Publication date: 2013